The Weight Function in the Subtree Kernel is Decisive
Tree data are ubiquitous because they model a large variety of situations,
e.g., the architecture of plants, the secondary structure of RNA, or the
hierarchy of XML files. Nevertheless, the analysis of these non-Euclidean data
is difficult per se. In this paper, we focus on the subtree kernel, a
convolution kernel for tree data introduced by Vishwanathan and Smola in the
early 2000s. More precisely, we investigate the influence of the weight
function from a theoretical perspective and in real data applications. We
establish, on a two-class stochastic model, that the performance of the subtree
kernel improves when the weight of leaves vanishes, which motivates the
definition of a new weight function, learned from the data rather than fixed by
the user as is usually done. To this end, we define a unified framework for
computing the subtree kernel from ordered or unordered trees, which is
particularly well suited to tuning parameters. Eight real-data classification
problems show the high efficiency of our approach, in particular for small
datasets, and confirm the importance of the weight function. Finally, we derive
a tool for visualizing the significant features. Comment: 36 pages
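For readers unfamiliar with the subtree kernel, here is a minimal Python sketch under stated assumptions: trees are encoded as hypothetical (label, children) tuples, the kernel is the sum over shared complete subtrees s of w(s) N_s(T1) N_s(T2), where N_s(T) counts the occurrences of s in T, and the weight w depends only on the height of s. It does not reproduce the weight function learned from data in the paper; decay_weight and leafless_weight are illustrative choices, the second one mimicking a vanishing leaf weight.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

# A tree node is (label, children), where children is a tuple of nodes.
Tree = Tuple[str, tuple]


def subtree_profile(tree: Tree, ordered: bool = False) -> Tuple[Counter, Dict[tuple, int]]:
    """Count each complete subtree (a node with all its descendants) of `tree`."""
    counts: Counter = Counter()
    heights: Dict[tuple, int] = {}

    def visit(node: Tree) -> tuple:
        label, children = node
        child_forms = [visit(child) for child in children]
        if not ordered:
            # Sorting the children's canonical forms makes child order irrelevant.
            child_forms.sort()
        form = (label, tuple(child_forms))
        # Leaves have height 0; otherwise 1 + the tallest child.
        heights[form] = 1 + max((heights[c] for c in child_forms), default=-1)
        counts[form] += 1
        return form

    visit(tree)
    return counts, heights


def subtree_kernel(t1: Tree, t2: Tree, weight: Callable[[int], float],
                   ordered: bool = False) -> float:
    """Sum over shared subtrees s of weight(height(s)) * N_s(t1) * N_s(t2)."""
    counts1, heights = subtree_profile(t1, ordered)
    counts2, _ = subtree_profile(t2, ordered)
    return sum(weight(heights[form]) * n1 * counts2[form]
               for form, n1 in counts1.items() if form in counts2)


def decay_weight(height: int, lam: float = 0.5) -> float:
    """Illustrative choice: weight decays geometrically with subtree height."""
    return lam ** height


def leafless_weight(height: int, lam: float = 0.5) -> float:
    """Same decay, but leaves (height 0) receive weight zero."""
    return 0.0 if height == 0 else lam ** height


def leaf(label: str) -> Tree:
    return (label, ())


t1 = ("a", (leaf("b"), leaf("c")))
t2 = ("a", (leaf("c"), leaf("b")))
print(subtree_kernel(t1, t2, decay_weight))                   # 2.5
print(subtree_kernel(t1, t2, leafless_weight, ordered=True))  # 0.0
```

Here t1 and t2 differ only in the order of the root's children: the unordered kernel matches the whole trees as well as the leaves, while the ordered kernel matches only the leaves, so with leafless_weight it vanishes.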
Optimal choice among a class of nonparametric estimators of the jump rate for piecewise-deterministic Markov processes
A piecewise-deterministic Markov process is a stochastic process whose
behavior is governed by an ordinary differential equation punctuated by random
jumps occurring at random times. We focus on the nonparametric estimation of
the jump rate of such a stochastic model, observed over a long time interval
under an ergodicity condition. We introduce an uncountable class
(indexed by the deterministic flow) of recursive kernel estimates of the jump
rate and we establish their strong pointwise consistency as well as their
asymptotic normality. We propose to select from this class the estimator with
minimal variance; this variance is unfortunately unknown and must itself be
estimated. We also discuss the choice of the bandwidth parameters by
cross-validation. Comment: 36 pages, 18 figures
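The ratio idea behind kernel estimation of a jump rate can be sketched in Python. This is a simpler, non-recursive estimator than the class described above: a kernel-smoothed count of jumps near x divided by the kernel-smoothed time spent near x. It is applied to a hypothetical toy PDMP with flow dx/dt = 1, jump rate lambda(x) = x and jumps that halve the state; the function names, the Gaussian kernel and the fixed bandwidth are assumptions, and no bandwidth selection is attempted.

```python
import math
import random
from statistics import NormalDist

_PHI = NormalDist()  # standard Gaussian, used as the smoothing kernel


def simulate_toy_pdmp(n_jumps: int, x0: float = 1.0, seed: int = 0):
    """Hypothetical toy PDMP: flow dx/dt = 1, jump rate lambda(x) = x,
    and each jump halves the current state. Returns one (start, duration)
    pair per deterministic segment between two jumps."""
    rng = random.Random(seed)
    segments = []
    x = x0
    for _ in range(n_jumps):
        e = rng.expovariate(1.0)
        # The jump occurs when the integrated rate x*tau + tau**2/2 reaches e;
        # this is the positive root, written in a numerically stable form.
        tau = 2.0 * e / (x + math.sqrt(x * x + 2.0 * e))
        segments.append((x, tau))
        x = (x + tau) / 2.0  # jump: the pre-jump state x + tau is halved
    return segments


def jump_rate_estimate(segments, x: float, bandwidth: float) -> float:
    """Kernel-smoothed number of jumps near x divided by the
    kernel-smoothed time spent near x."""
    h = bandwidth
    jumps = 0.0
    occupation = 0.0
    for start, tau in segments:
        end = start + tau  # location just before the jump
        jumps += _PHI.pdf((x - end) / h) / h
        # Exact time integral of the kernel along the unit-speed path y = start + s.
        occupation += _PHI.cdf((end - x) / h) - _PHI.cdf((start - x) / h)
    return jumps / occupation


segments = simulate_toy_pdmp(20000)
print(jump_rate_estimate(segments, x=1.0, bandwidth=0.1))  # close to lambda(1) = 1
```

Because the toy flow moves at unit speed, the occupation term can be integrated exactly with the Gaussian CDF instead of discretizing time.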
Integral estimation based on Markovian design
Suppose that a mobile sensor describes a Markovian trajectory in the ambient
space. At each time the sensor measures an attribute of interest, e.g., the
temperature. Using only the location history of the sensor and the associated
measurements, the aim is to estimate the average value of the attribute over
the space. In contrast to classical probabilistic integration methods, e.g.,
Monte Carlo, the proposed approach does not require any knowledge of the
distribution of the sensor trajectory. Probabilistic bounds on the convergence
rates of the estimator are established. These rates are faster than the
traditional root-n rate, where n is the sample size, of other probabilistic
integration methods. For finite sample sizes, the good behaviour of the
procedure is demonstrated through simulations, and an application to the
evaluation of the average temperature of the oceans is considered. Comment: 45 pages
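One way to picture integration from a Markovian design is to weight each measurement by the inverse of an estimated density of the visited locations, so that no knowledge of the trajectory's law is needed. The Python sketch below does this with a leave-one-out Gaussian kernel density estimate. It is a simplified illustration under assumptions (a hypothetical one-dimensional AR(1) sensor path, a hypothetical field exp(-y^2), a fixed bandwidth), not necessarily the authors' estimator, and it makes no claim about the faster-than-root-n rates. Over a bounded domain, dividing the estimated integral by the domain's volume gives the average value.

```python
import numpy as np


def ar1_trajectory(n: int, rho: float = 0.5, seed: int = 0) -> np.ndarray:
    """Hypothetical sensor path: a Gaussian AR(1) chain whose stationary law is N(0, 1)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n) * np.sqrt(1.0 - rho**2)
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = rho * x[t - 1] + noise[t]
    return x


def kernel_integral_estimate(x: np.ndarray, values: np.ndarray, bandwidth: float) -> float:
    """Estimate the integral of the field from measurements `values` taken at
    the visited locations `x`, weighting each measurement by the inverse of a
    leave-one-out Gaussian kernel estimate of the density of visited locations."""
    n = x.size
    u = (x[:, None] - x[None, :]) / bandwidth
    k = np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * bandwidth)
    np.fill_diagonal(k, 0.0)  # leave one out: a point does not vote for itself
    density = k.sum(axis=1) / (n - 1)
    return float(np.mean(values / density))


x = ar1_trajectory(1500)
field = np.exp(-x**2)  # hypothetical attribute; its exact integral is sqrt(pi)
print(kernel_integral_estimate(x, field, bandwidth=0.25), np.sqrt(np.pi))
```

The pairwise kernel matrix keeps the sketch short but costs O(n^2) memory; a tree-based or binned density estimate would be needed for long trajectories.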
Nonparametric estimation of the conditional distribution of the inter-jumping times for piecewise-deterministic Markov processes
This paper presents a nonparametric method for estimating the conditional
density associated with the jump rate of a piecewise-deterministic Markov
process. In our framework, the estimation requires only a single observation of
the process over a long time interval. Our method relies on a generalization of
Aalen's multiplicative intensity model. We prove the uniform consistency of our
estimator under reasonable assumptions on the primitive
characteristics of the process. A simulation example illustrates the behavior
of our estimator.
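The Nelson-Aalen estimator is the standard estimator of the cumulative hazard under Aalen's multiplicative intensity model. To illustrate how conditioning can enter, here is a Python sketch of a kernel-weighted (Beran-type) Nelson-Aalen estimator of the cumulative hazard of an inter-jump time given a mark, such as the post-jump location. It is an assumption-laden stand-in, not the paper's estimator of the conditional density (which would additionally require smoothing in time); the function name and the simulated data are hypothetical.

```python
import numpy as np


def conditional_cumulative_hazard(times: np.ndarray, marks: np.ndarray,
                                  x: float, t: float, bandwidth: float) -> float:
    """Kernel-weighted (Beran-type) Nelson-Aalen estimate of the cumulative
    hazard at time t of the inter-jump time, given a mark close to x."""
    weights = np.exp(-0.5 * ((marks - x) / bandwidth) ** 2)  # Gaussian kernel in the mark
    order = np.argsort(times)
    sorted_times = times[order]
    sorted_weights = weights[order]
    # Weighted size of the risk set at each observed time: total weight of
    # the inter-jump times that are at least as long.
    at_risk = np.cumsum(sorted_weights[::-1])[::-1]
    observed = sorted_times <= t
    return float(np.sum(sorted_weights[observed] / at_risk[observed]))


# Hypothetical data: the mark is uniform on [0, 1] and, given the mark m,
# the inter-jump time is exponential with hazard 1 + m.
rng = np.random.default_rng(0)
marks = rng.uniform(0.0, 1.0, size=5000)
times = rng.exponential(scale=1.0 / (1.0 + marks))
estimate = conditional_cumulative_hazard(times, marks, x=0.5, t=0.5, bandwidth=0.1)
print(estimate)  # target: (1 + 0.5) * 0.5 = 0.75
```

Since every observation belongs to its own risk set, the weighted risk set never vanishes, so the sketch needs no special handling of empty risk sets; censoring is not modelled.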
Detection of Common Subtrees with Identical Label Distribution
Frequent pattern mining is a relevant method to analyse structured data, like
sequences, trees or graphs. It consists of identifying characteristic
substructures of a dataset. This paper deals with a new type of patterns for
tree data: common subtrees with identical label distribution. Their detection
is far from obvious, since the underlying isomorphism problem is
graph-isomorphism-complete. An elaborate search algorithm is developed and analysed
from both theoretical and numerical perspectives. Based on this, the
enumeration of patterns is performed through a new lossless compression scheme
for trees, called DAG-RW, whose complexity is investigated as well. The method
performs very well, both in terms of computation time and in the analysis of
real datasets from the literature. Compared to other substructures, such as
topological subtrees and labelled subtrees, for which the isomorphism problem is
linear, the patterns found provide a more parsimonious representation of the
data. Comment: 40 pages
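The idea of compressing trees losslessly into a directed acyclic graph (DAG) can be illustrated with the classical DAG reduction of unordered labelled trees. The Python sketch below merges isomorphic subtrees of a forest by hash-consing and then reads off the subtree classes shared by two trees. It is not DAG-RW and does not detect the paper's patterns with identical label distribution; the tree encoding and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A tree node is (label, children), where children is a tuple of nodes.
Node = Tuple[str, tuple]


def dag_reduce(trees: List[Node]) -> Tuple[List[int], Dict[tuple, int]]:
    """Merge isomorphic subtrees (unordered, labelled) of a whole forest into
    one DAG. Returns the root vertex of each tree and the vertex table."""
    table: Dict[tuple, int] = {}

    def visit(node: Node) -> int:
        label, children = node
        # A vertex is identified by its label and the multiset of its
        # children's vertex ids, so child order is ignored.
        key = (label, tuple(sorted(visit(child) for child in children)))
        if key not in table:
            table[key] = len(table)
        return table[key]

    return [visit(tree) for tree in trees], table


def vertices_below(root: int, table: Dict[tuple, int]) -> Set[int]:
    """DAG vertices reachable from `root`, i.e. its distinct subtree classes."""
    children = {vid: kids for (_, kids), vid in table.items()}
    seen: Set[int] = set()
    stack = [root]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(children[v])
    return seen


b, c = ("b", ()), ("c", ())
t1 = ("a", (b, b, c))
t2 = ("d", (c, ("a", (c, b, b))))
roots, table = dag_reduce([t1, t2])
common = vertices_below(roots[0], table) & vertices_below(roots[1], table)
print(roots, len(table), sorted(common))
```

In this example the two trees, with 10 nodes in total, are stored as a 4-vertex DAG, and the subtrees b, c and a(b, b, c) are found in both.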
The Weight Function in the Subtree Kernel is Decisive
Tree data are ubiquitous because they model a large variety of situations,
e.g., the architecture of plants, the secondary structure of RNA, or the
hierarchy of XML files. Nevertheless, the analysis of these non-Euclidean data
is difficult per se. In this paper, we focus on the subtree kernel, a
convolution kernel for tree data introduced by Vishwanathan and Smola in the
early 2000s. More precisely, we investigate the influence of the weight
function from a theoretical perspective and in real data applications. We
establish, on a two-class stochastic model, that the performance of the subtree
kernel improves when the weight of leaves vanishes, which motivates the
definition of a new weight function, learned from the data rather than fixed by
the user as is usually done. To this end, we define a unified framework for
computing the subtree kernel from ordered or unordered trees, which is
particularly well suited to tuning parameters. Two real-data classification
problems show the high efficiency of our approach, in particular compared with
the approaches considered in the literature, and confirm the importance of the
weight function. Finally, we derive a tool for visualizing the significant
features. Comment: 28 pages
Estimation of Piecewise-Deterministic Trajectories in a Quantum Optics Scenario
The manipulation of individual copies of quantum systems is one of the most groundbreaking experimental discoveries in the field of quantum physics. It has been shown, both experimentally and theoretically, that the dynamics of a single copy of an open quantum system follows a trajectory of a piecewise-deterministic process. To the best of our knowledge, this field of application has not yet been explored in the applied mathematics literature, from either a probabilistic or a statistical perspective. The objective of this chapter is to provide a self-contained presentation of this kind of model and of its specificities in terms of the observation scheme of the system, as well as a first attempt to address a statistical issue that arises in the quantum world.
Optimal nonparametric estimation of the jump rate of a piecewise-deterministic Markov process
A piecewise-deterministic Markov process is a stochastic process whose trajectory is described by a differential equation perturbed by random jumps at random times. We are interested in estimating the jump rate of such a process observed over a long time period under an ergodicity assumption. We introduce a class of consistent and asymptotically Gaussian nonparametric estimators. We propose to choose the estimator with minimal variance, this variance itself having to be estimated.
- …